An Alternative Method of Training Probabilistic LR Parsers

Authors

  • Mark-Jan Nederhof
  • Giorgio Satta
Abstract

We discuss existing approaches to training LR parsers, which have been used for statistical resolution of structural ambiguity. These approaches are nonoptimal, in the sense that a collection of probability distributions cannot be obtained. In particular, some probability distributions expressible in terms of a context-free grammar cannot be expressed in terms of the LR parser constructed from that grammar, under the restrictions of the existing approaches to training LR parsers. We present an alternative way of training that is provably optimal, and that allows all probability distributions expressible in the context-free grammar to be carried over to the LR parser. We also demonstrate empirically that this kind of training can be effectively applied to a large treebank.

Similar resources

Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars

We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an...

The Lane Table Method Of Constructing LR(1) Parsers

The lane-tracing algorithm is a reduced-space LR(1) parser generation algorithm. The previous version of the lane-tracing algorithm regenerates states involved in reduce/reduce conflicts by employing the practical general method. In this paper we describe an alternative lane-tracing approach, which regenerates states based on the lane table method. We discuss the details of this new algorithm, study...

Head-Driven PCFGs with Latent-Head Statistics

Although state-of-the-art parsers for natural language are lexicalized, it was recently shown that an accurate unlexicalized parser for the Penn tree-bank can be simply read off a manually refined treebank. While lexicalized parsers often suffer from sparse data, manual mark-up is costly and largely based on individual linguistic intuition. Thus, across domains, languages, and tree-bank annotat...

Faster Generalized LR Parsing

Tomita devised a method of generalized LR (GLR) parsing to parse ambiguous grammars efficiently. A GLR parser uses linear-time LR parsing techniques as long as possible, falling back on more expensive general techniques when necessary. Much research has addressed speeding up LR parsers. However, we argue that this previous work is not transferable to GLR parsers. Instead, we speed up LR parsers b...

Eine Rekonstruktion der LR-Theorie zur Elimination von Redundanz mit Anwendung auf den Bau von ELR-Parsern

In this thesis, we present work on two problems from the field of LR parser construction, a family of syntax analysis techniques for context-free languages. In the first part, we show that the traditional LR parser construction technique produces parsers which are burdened with a substantial amount of systematic redundancy. We develop a new and well-founded method which defines what we call gen...

Publication date: 2004